Multi-Task Conformer with Multi-Feature Combination for Speech Emotion Recognition

Abstract

Along with automatic speech recognition, many researchers have been actively studying speech emotion recognition, since emotion information is as crucial as textual information for effective interactions. Emotion can be divided into categorical emotion and dimensional emotion. Although categorical emotion is widely used, dimensional emotion, typically represented as arousal and valence, can provide more detailed information on emotional states. Therefore, in this paper, we propose a Conformer-based model for arousal and valence recognition. Our model uses a Conformer as an encoder, a fully connected layer as a decoder, and statistical pooling layers as a connector. In addition, we adopted multi-task learning and multi-feature combination, which showed remarkable performance for speech emotion recognition and time-series analysis, respectively. The proposed model achieves a state-of-the-art recognition accuracy of 70.0 ± 1.5% in terms of unweighted accuracy on the IEMOCAP dataset.
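
The connector and decoder described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Conformer encoder is left out and its output is assumed to be a plain list of frame vectors, and the three-class label set, the equal task weights, the use of population standard deviation, and all function names are assumptions made only for illustration.

```python
import math

def statistical_pooling(frames):
    """Collapse a (time x dim) sequence into one vector:
    per-dimension means followed by per-dimension standard deviations."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    # Population standard deviation (an assumption; the abstract does not specify).
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
            for d in range(dims)]
    return means + stds

def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def multi_task_loss(pooled, heads, targets, task_weights):
    """Weighted sum of per-task losses computed from one shared pooled vector."""
    return sum(task_weights[task] * cross_entropy(linear(pooled, w, b), targets[task])
               for task, (w, b) in heads.items())

# Hypothetical encoder output: two frames with two features each.
frames = [[1.0, 2.0], [3.0, 2.0]]
pooled = statistical_pooling(frames)

# Hypothetical heads: 3 classes per task, zero-initialised weights.
zero_head = ([[0.0] * 4 for _ in range(3)], [0.0] * 3)
heads = {"arousal": zero_head, "valence": zero_head}
loss = multi_task_loss(pooled, heads,
                       targets={"arousal": 0, "valence": 2},
                       task_weights={"arousal": 0.5, "valence": 0.5})
```

Both heads read the same pooled vector, which is the sense in which the tasks share a representation here; how the paper actually weights or combines its tasks is not stated in the abstract.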

Similar resources

Multi-task, multi-label and multi-domain learning with residual convolutional networks for emotion recognition

Automated emotion recognition in the wild from facial images remains a challenging problem. Although recent advances in Deep Learning have brought a significant breakthrough in this topic, strong changes in pose, orientation and point of view severely harm current approaches. In addition, the acquisition of labeled datasets is costly, and current state-of-the-art deep learning algorithms canno...

Discretized Continuous Speech Emotion Recognition with Multi-Task Deep Recurrent Neural Network

Estimating continuous emotional states from speech as a function of time has traditionally been framed as a regression problem. In this paper, we present a novel approach that moves the problem into the classification domain by discretizing the training labels at different resolutions. We employ a multi-task deep bidirectional long-short term memory (BLSTM) recurrent neural network (RNN) traine...

Low-Order Multi-Level Features for Speech Emotion Recognition

Various feature selection and classification schemes have been proposed to improve the efficiency of speech emotion classification and recognition. In this paper we propose a multi-level organization of the classification process and features. The main idea is to perform classification of speech emotions in a step-by-step manner, using a different feature subset for every step. We applied the maximal efficiency f...

Feature Transfer Learning for Speech Emotion Recognition

Speech Emotion Recognition (SER) has achieved some substantial progress in the past few decades since the dawn of emotion and speech research. In many aspects, various research efforts have been made in an attempt to achieve human-like emotion recognition performance in real-life settings. However, with the availability of speech data obtained from different devices and varied acquisition condi...

Multi-task learning deep neural networks for speech feature denoising

Traditional automatic speech recognition (ASR) systems usually suffer a sharp performance drop when noise is present in speech. To make ASR robust, we introduce a new model using multi-task learning deep neural networks (MTL-DNN) to solve the speech denoising task at the feature level. In this model, the networks are initialized by pre-training restricted Boltzmann machines (RBM) and fine-tuned by...

Journal

Journal title: Symmetry

Year: 2022

ISSN: 0865-4824, 2226-1877

DOI: https://doi.org/10.3390/sym14071428